This project tests whether the weather affects the number of available parking spaces in Stavanger. The idea is built on the premise that it seems to be harder to find a space in a public parking lot when the weather is bad. But do the data and the impression tell the same story?
I have no personal connection to Stavanger. Stavanger was mainly chosen for its publicly available parking data. Thanks to my professor in Data Science, Øystein Myrland at the University of Tromsø (https://uit.no/ansatte/person?p_document_id=41412), for the initial idea and inspiration for this project.
The hypothesis is that the number of available parking spaces goes down in bad weather.
The weather data is collected from:
Norsk klimaservicesenter (Norwegian Centre for Climate Services): https://klimaservicesenter.no/observations/
Temperature and precipitation from station ID SN44640.
Wind information from station ID SN44560.
The parking data is collected from: https://open.stavanger.kommune.no/dataset/stavanger-parkering/
More information on each parking lot: https://stavanger-parkering.no/parkering/p-hus/
Interactive map using the “leaflet” library:
Feel free to inspect the map; click on a marker to see the name of the parking lot.
A table of the correlations between the parking lots and the different weather variables. We use data from every day between 08:00 and 18:00.
This does not paint a clear picture of a correlation. But if we count values over 0.3 as a weak correlation, we have at least one number above that (Siddis and air temperature). To inspect further, we run a linear regression on the data.
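To make the numbers in the table concrete, here is a minimal sketch of how a single correlation value can be computed and compared with the 0.3 cut-off. It assumes the Pearson coefficient is meant and that the cut-off applies to the absolute value; both are assumptions, as are the function names and the sample values, which are made up and are not project data.

```python
from math import sqrt

WEAK_THRESHOLD = 0.3  # the cut-off mentioned in the text


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def is_at_least_weak(r, threshold=WEAK_THRESHOLD):
    """Assumption: the cut-off is applied to the absolute value of r."""
    return abs(r) > threshold


# Made-up illustrative values, NOT data from the project.
temperature = [2.0, 4.0, 6.0, 8.0, 10.0]
free_spaces = [50, 45, 47, 40, 38]
r = pearson(temperature, free_spaces)
print(round(r, 2), is_at_least_weak(r))
```

A negative value means the two series move in opposite directions, so the sign matters as much as the size when reading the table.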
Three interactive regression plots for air temperature, precipitation and wind speed, respectively. Siddis is set as the default for the regression graphs because it shows the most promising values in the correlation table, but feel free to change the selection and inspect the other parking lots as well. Note: Forum and Parketten have some problems/anomalies in their data (see the time graphs below).
As of 17.12.20, the data for air temperature seem to go in the opposite direction from what I expected. The number of available parking spaces seems to go down at higher temperatures (except for Forum). Precipitation and wind speed do not seem to follow a common trend: for some parking lots availability goes up and for others it goes down with higher precipitation and wind speed. This can change as time goes on and we collect more data. Unfortunately for this project, the weather seems to have been fairly stable. To inspect even deeper, time plots for both the parking data and the weather are shown below.
Note that the parking data for the parking lots Forum and Parketten have some anomalies.
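For readers who want to see what a regression line in such a plot represents, the following is a small sketch of an ordinary least squares fit with one explanatory variable. It assumes a simple straight-line model; the function name fit_line and the sample numbers are hypothetical and do not come from the project.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values, NOT data from the project.
weather_value = [0, 1, 2, 3]
free_spaces = [10, 8, 6, 4]
slope, intercept = fit_line(weather_value, free_spaces)
print(slope, intercept)
```

A negative slope means fewer free spaces as the weather variable increases, and a positive slope means more.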
The source code can be found at: https://github.com/sso149/Bed-2056-Project
I chose Python 3 for the data collection because it was the language I was most familiar with when I started this project. Python is also a good fit for this scenario, since we are collecting from the web and errors can occur. Python's exception handling (try/except) makes it easy to catch these errors and make sure the collector does not crash if/when an unexpected error occurs. (See the code inside src_python.)
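The pattern described above can be sketched roughly as follows. This is not the code from src_python; the names fetch and safe_collect are hypothetical, and the sketch assumes that a failed collection should simply be logged and skipped.

```python
import logging
import urllib.request

log = logging.getLogger("collector")


def fetch(url, timeout=10):
    """Download one page; may raise on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def safe_collect(task, *args):
    """Run one collection step; log any error instead of crashing."""
    try:
        return task(*args)
    except Exception as exc:  # deliberately broad: the collector must keep running
        log.warning("collection failed: %r", exc)
        return None
```

A call such as safe_collect(fetch, url) then returns the page content on success and None on failure, so one bad request only costs a single data point.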
I made my own scheduler in Python 3 instead of using something like cron on Linux. This was mainly out of personal interest; I could easily have used a cron job instead, since the collector runs on a Linux server. The only advantage is that no alteration is required to run it on another OS. The scheduler is made not to drift, regardless of how long a task takes to finish. After a task is completed, it calculates how long it needs to sleep to execute exactly at the right time, instead of just sleeping a fixed time, which would cause drift over time. For the parking data, for example, this means the data is collected exactly at minutes 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52 and 56 of each hour.
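One way to get this drift-free behaviour is to sleep until the next multiple of the interval on the clock, rather than for a fixed duration. The sketch below illustrates that idea only; the scheduler in the repository may be implemented differently, and the names seconds_until_next_slot and run_on_grid as well as the clock, sleep and max_runs parameters are assumptions made for the example.

```python
import time


def seconds_until_next_slot(now, interval):
    """Seconds from `now` (epoch seconds) to the next multiple of `interval`.

    If `now` is exactly on a slot, the full interval is returned.
    """
    return interval - (now % interval)


def run_on_grid(task, interval, clock=time.time, sleep=time.sleep, max_runs=None):
    """Run `task` on a fixed time grid, however long each run takes."""
    runs = 0
    while max_runs is None or runs < max_runs:
        # Recomputed after every run, so the task's duration cannot accumulate.
        sleep(seconds_until_next_slot(clock(), interval))
        task()
        runs += 1
```

With interval=240 (4 minutes), the slots are multiples of 240 seconds since the epoch. Because an hour is a whole number of such intervals, they fall on minutes 0, 4, 8 and so on of each hour.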
The scraper is also written in Python 3. The scrapers for parking and weather are separated into classes. Each class holds the state/information and the functions to request, partition and then store the data. To start the scraper, a simple start function is provided (see scraper.py); it sets up the schedulers and scrapers for both parking data and weather data and starts them in separate threads. The parking data is collected every 4 minutes. The weather is collected every morning at 05:00: all the data for the previous day is collected, and it contains the actual recorded weather for each hour of the day.
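The structure described above could look roughly like the sketch below. It is not the content of scraper.py: the class and method names, the placeholder payloads and the in-memory list that stands in for the output files are all assumptions, and each thread runs a single request/partition/store cycle instead of a scheduler loop to keep the example short.

```python
import threading


class Scraper:
    """Hypothetical base class: request, partition, then store."""

    def __init__(self, name):
        self.name = name
        self.rows = []  # stands in for the file the real scraper writes to

    def request(self):
        raise NotImplementedError

    def partition(self, raw):
        raise NotImplementedError

    def store(self, records):
        self.rows.extend(records)

    def run_once(self):
        self.store(self.partition(self.request()))


class ParkingScraper(Scraper):
    def request(self):
        # Placeholder payload; the real scraper would fetch the open data source.
        return "lot_a,120;lot_b,35"

    def partition(self, raw):
        return [tuple(item.split(",")) for item in raw.split(";")]


class WeatherScraper(Scraper):
    def request(self):
        # Placeholder payload; one made-up value per hour.
        return "00:00,3.1;01:00,2.8"

    def partition(self, raw):
        return [tuple(item.split(",")) for item in raw.split(";")]


def start(scrapers):
    """Start every scraper in its own thread and return the threads."""
    threads = [threading.Thread(target=s.run_once, name=s.name) for s in scrapers]
    for thread in threads:
        thread.start()
    return threads
```

Keeping the two scrapers in separate threads means a slow or failing weather request does not delay the 4-minute parking collection.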
Web scraper
Since the web scraper just collects the data and inserts it into files with no further tests or corrections, the data is far from perfect. There are some missing values, duplicate rows and, for two of the parking lots, sometimes just completely wrong data. (See the code inside src_R/wrangle.r.)
The libraries used for the data preparation are mainly tidyverse and lubridate.
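The project does this step in R. Purely as an illustration of the kind of clean-up involved, and written in Python to match the collection code, the sketch below drops rows with missing values and exact duplicate rows. The function name clean, the row layout (timestamp, parking lot, free spaces) and the sample rows are assumptions and not taken from wrangle.r. Detecting the "completely wrong" values is not covered, since the rule used for that is not described here.

```python
def clean(rows):
    """Drop rows with missing values and exact duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None or value == "" for value in row):
            continue  # missing value somewhere in the row
        if row in seen:
            continue  # exact duplicate of an earlier row
        seen.add(row)
        cleaned.append(row)
    return cleaned


# Made-up illustrative rows, NOT data from the project.
raw_rows = [
    ("2020-12-01 08:00", "lot_a", 120),
    ("2020-12-01 08:00", "lot_a", 120),   # duplicate
    ("2020-12-01 08:04", "lot_a", None),  # missing value
    ("2020-12-01 08:04", "lot_b", 0),     # zero free spaces is valid, not missing
]
print(clean(raw_rows))
```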